[Feat] Inter-batch-parallelism #8700
Conversation
Hello @tchaton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-10 13:31:14 UTC
for more information, see https://pre-commit.ci
Codecov Report
@@            Coverage Diff             @@
##           master    #8700      +/-   ##
===========================================
- Coverage      93%      43%      -49%
===========================================
  Files         169      170        +1
  Lines       14068    14146       +78
===========================================
- Hits        13038     6138     -6900
- Misses       1030     8008     +6978
        return

class LightningFetcher:
n00b question: certain classes are prefixed with Lightning while others aren't.
Components by name:
- with the prefix: LightningModule, LightningDataModule, LightningLoggerBase
- without: Trainer, Callbacks, Logger, Profiler
Utilities like this don't contain logic specific to the rest of the framework, so I wonder if we could call this just Prefetcher?
It could also help users feel like they don't need to learn Lightning-specific things.
I would propose DataFetcher
for _ in range(self.num_prefetch_batches + 1):
    if not done:
        with torch.cuda.stream(cuda_stream):
should this validate somewhere that torch.cuda is available?
We validate it here, as it is only supported on GPU devices: https://github.com/PyTorchLightning/pytorch-lightning/blob/inter_batch_parallism/pytorch_lightning/trainer/trainer.py#L1344
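For readers skimming the thread, a minimal sketch of the kind of guard being discussed, assuming the fetcher falls back to plain (non-overlapping) fetching when CUDA is unavailable; the variable handling is illustrative, not the PR's actual code:

import torch

# Only create a side stream for host-to-device copies when CUDA is available;
# otherwise disable the stream-based prefetching path entirely.
if torch.cuda.is_available():
    cuda_stream = torch.cuda.Stream()
else:
    cuda_stream = None  # plain prefetching without stream overlap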
    self,
    dataloader,
    batch_to_device: Callable,
    profiler: "pl.profiler.base.BaseProfiler",
Could the profiler be optional? The Lightning Trainer would always set it, but this becomes a handy utility for users outside the project too (it might even draw them into the project).
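A minimal sketch of how the optional profiler could look, with a no-op fallback; the _profile helper and default arguments are illustrative assumptions, not the PR's implementation:

from contextlib import nullcontext
from typing import Callable, Optional

import pytorch_lightning as pl


class LightningFetcher:

    def __init__(
        self,
        dataloader,
        batch_to_device: Callable,
        profiler: Optional["pl.profiler.base.BaseProfiler"] = None,
    ) -> None:
        self.dataloader = dataloader
        self.batch_to_device = batch_to_device
        self.profiler = profiler

    def _profile(self, action_name: str):
        # Delegate to the profiler when one is set; otherwise use a no-op
        # context, so the fetcher remains usable outside the Lightning Trainer.
        if self.profiler is not None:
            return self.profiler.profile(action_name)
        return nullcontext()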
This class is used to perform ``pre-fetching`` for the ``train`` dataloader
and apply inter-batch parallelism if enabled.
This could/should be used for evaluation too, to still overlap the forward pass with the host-to-device transfer.
Yes, added in the fault-tolerant training PRs :)
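To make the idea concrete, here is a minimal sketch of a prefetching iterator that overlaps the host-to-device copy of the next batch with the computation on the current one. It assumes each batch is a single tensor and that the DataLoader uses pin_memory=True so non-blocking copies can actually overlap; it is an illustration of the technique, not the code added by this PR.

import torch


def prefetch_iterator(dataloader, device, num_prefetch_batches=1):
    # Side stream used only for host-to-device copies (CUDA devices only).
    copy_stream = torch.cuda.Stream() if device.type == "cuda" else None
    iterator = iter(dataloader)
    queue = []

    def fetch_next():
        try:
            batch = next(iterator)
        except StopIteration:
            return None
        if copy_stream is not None:
            # Issue the copy on the side stream so it can overlap with compute.
            with torch.cuda.stream(copy_stream):
                return batch.to(device, non_blocking=True)
        return batch.to(device)

    # Pre-fill the queue so the first copy is already in flight.
    for _ in range(num_prefetch_batches):
        batch = fetch_next()
        if batch is None:
            break
        queue.append(batch)

    while queue:
        batch = queue.pop(0)
        if copy_stream is not None:
            # The compute stream must wait for this batch's copy to finish.
            # (A production version would also call
            # batch.record_stream(torch.cuda.current_stream()) so the caching
            # allocator does not reuse the memory too early.)
            torch.cuda.current_stream().wait_stream(copy_stream)
        next_batch = fetch_next()  # kick off the next copy before yielding
        if next_batch is not None:
            queue.append(next_batch)
        yield batch

The training loop would then simply iterate `for batch in prefetch_iterator(train_dataloader, device): ...`, so the forward/backward on the current batch overlaps with the copy of the next one.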
What does this PR do?
Fixes #8316
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃